psycopg2.OperationalError: server closed the connection unexpectedly #829

Closed
RekGRpth opened this issue Dec 21, 2018 · 16 comments

@RekGRpth

This code:

sql = 'select pg_backend_pid()'

class DAL():
    def __init__(self, uri):
        from psycopg2 import connect
        #from pg8000 import connect
        self.cursor = connect(user='postgres', host='postgres').cursor()

    def executesql(self, sql):
        self.cursor.execute(sql)
        return self.cursor.fetchall()

def thread():
    db = DAL('')
    i = 10
    import time
    from pprint import pprint
    while i > 0:
        pprint(dict(function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
        time.sleep(1)
        i -= 1

def process():
    from pprint import pprint
    pprint(dict(function='process:begin'))
    import time
    time.sleep(2)
    pprint(dict(function='process:end'))

def main():
    from pprint import pprint
    pprint(dict(function='main:begin'))
    import threading
    threading.Thread(target=thread, name='mythread').start()
    import time
    time.sleep(2)
    import multiprocessing
    multiprocessing.Process(target=process, name='myprocess').start()
    pprint(dict(function='main:end'))

if __name__ == '__main__': main()

raises the following exception:

Exception in thread mythread:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "test3.py", line 19, in thread
    pprint(dict(function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
  File "test3.py", line 10, in executesql
    self.cursor.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

but with pg8000 everything is OK.

@RekGRpth
Author

RekGRpth commented Dec 21, 2018

Also, with psycopg2 on Python 2 everything is OK.

@dvarrazzo
Member

Try keeping the connection alive. Instead of:

    self.cursor = connect(user='postgres', host='postgres').cursor()

try:

    self.connection = connect(user='postgres', host='postgres')
    self.cursor = self.connection.cursor()

@RekGRpth
Author

The result is the same.

@dimandzhi

I'm having the same issue. As a workaround, I had to create a wrapper around the connection object that hooks the methods I use. In the cursor() method it runs a SELECT 1 "ping" check and reconnects if needed, so it always returns a valid working cursor.
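
A minimal sketch of that kind of ping-and-reconnect wrapper, assuming you reconnect with the same keyword arguments used initially (the connect_kwargs below stand for whatever your application passes to connect()):

    import psycopg2

    class ReconnectingConnection:
        """Hand out cursors from a connection that is pinged first."""

        def __init__(self, **connect_kwargs):
            self._connect_kwargs = connect_kwargs
            self._conn = psycopg2.connect(**connect_kwargs)

        def cursor(self, *args, **kwargs):
            try:
                # Cheap "ping": fails fast if the server went away.
                with self._conn.cursor() as cur:
                    cur.execute('SELECT 1')
            except (psycopg2.OperationalError, psycopg2.InterfaceError):
                # Stale or closed connection: open a fresh one.
                self._conn = psycopg2.connect(**self._connect_kwargs)
            return self._conn.cursor(*args, **kwargs)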

@dvarrazzo
Member

I've played a bit with this issue, which is reproducible. Using the following script:

import time
import threading
import multiprocessing
from pprint import pprint
from psycopg2 import connect

sql = 'select pg_backend_pid()'


class DAL():
    def __init__(self, uri):
        self.conn = connect("")
        self.cursor = self.conn.cursor()

    def executesql(self, sql):
        self.cursor.execute(sql)
        return self.cursor.fetchall()


def thread():
    db = DAL('')
    i = 3
    while i > 0:
        pprint(dict(
            function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
        time.sleep(1)
        i -= 1


def process():
    pprint(dict(function='process:begin'))
    time.sleep(2)
    pprint(dict(function='process:end'))


def main():
    pprint(dict(function='main:begin'))
    t = threading.Thread(target=thread, name='mythread')
    t.start()
    time.sleep(1)
    multiprocessing.Process(target=process, name='myprocess').start()
    t.join()
    pprint(dict(function='main:end'))


if __name__ == '__main__':
    main()

The script has the following output, on Py 2.7 and 3.6:

$ PYTHONPATH=build/lib.2.7/ python ./test-829.py 
{'function': 'main:begin'}
{'executesql': '16892', 'function': 'thread:run'}
{'function': 'process:begin'}
{'executesql': '16892', 'function': 'thread:run'}
{'executesql': '16892', 'function': 'thread:run'}
{'function': 'process:end'}
{'function': 'main:end'}
$ PYTHONPATH=build/lib.3.6/ python3 ./test-829.py 
{'function': 'main:begin'}
{'executesql': '16956', 'function': 'thread:run'}
{'executesql': '16956', 'function': 'thread:run'}
{'function': 'process:begin'}
Exception in thread mythread:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "./test-829.py", line 25, in thread
    function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
  File "./test-829.py", line 16, in executesql
    self.cursor.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.


{'function': 'main:end'}
{'function': 'process:end'}

Saving the output of strace in two different files:

PYTHONPATH=build/lib.2.7/ strace -f python -u ./test-829.py > out-829-py2 2>&1
PYTHONPATH=build/lib.3.6/ strace -f python3 -u ./test-829.py > out-829-py3 2>&1

After making the pid numbers uniform in the files (P0, P1, P2), here is the relevant part (omitting what comes before the connection to the db):

python 2 strace

$ egrep '(= 3$|sendto|close|clone)' out-829-py2
clone(child_stack=0x7fd47e85afb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd47e85b9d0, tls=0x7fd47e85b700, child_tidptr=0x7fd47e85b9d0) = P1
[...]
[pid  P1] socket(AF_UNIX, SOCK_STREAM, 0) = 3
[pid  P1] sendto(3, "\0\0\0?\0\3\0\0user\0piro\0database\0piro\0"..., 63, MSG_NOSIGNAL, NULL, 0) = 63
[pid  P1] sendto(3, "Q\0\0\0\nBEGIN\0", 11, MSG_NOSIGNAL, NULL, 0) = 11
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P0] close(6)                    = 0
[pid  P0] close(4)                    = 0
[pid  P0] clone( <unfinished ...>
[pid  P0] <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd485bfda10) = P2
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P1] sendto(3, "X\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
[pid  P1] close(3 <unfinished ...>
[pid  P1] <... close resumed> )       = 0
close(5)                                = 0

python 3 strace

$ egrep '(= 3$|sendto|close|clone)' out-829-py3
clone(child_stack=0x7f4662f92fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f4662f939d0, tls=0x7f4662f93700, child_tidptr=0x7f4662f939d0) = P1
[...]
[pid  P1] socket(AF_UNIX, SOCK_STREAM, 0) = 3
[pid  P1] sendto(3, "\0\0\0?\0\3\0\0user\0piro\0database\0piro\0"..., 63, MSG_NOSIGNAL, NULL, 0) = 63
[pid  P1] sendto(3, "Q\0\0\0\nBEGIN\0", 11, MSG_NOSIGNAL, NULL, 0) = 11
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P0] close(4)                    = 0
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
[pid  P1] <... sendto resumed> )      = 29
[pid  P0] close(4)                    = 0
[pid  P0] close(4)                    = 0
[pid  P0] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f466a434a10) = P2
[pid  P0] close(5)                    = 0
[pid  P2] sendto(3, "X\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
[pid  P2] close(3)                    = 0
[pid  P2] close(4)                    = 0
[pid  P2] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe)
[pid  P1] close(3)                    = 0
[pid  P1] openat(AT_FDCWD, "/usr/lib/python3.6/threading.py", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0
[pid  P1] openat(AT_FDCWD, "./test-829.py", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0
psycopg2.OperationalError: server closed the connection unexpectedly
[pid  P1] openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0
[pid  P1] openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0

I see here that on Python 3, the process P2, politely, closes the connection belonging to P1 (the X command in the sendto() call) 😳

Similar behaviour can be seen in a run with debug enabled: on Python 3 the trace would be something like:

[P1] pq_execute: executing SYNC query: pgconn = 0xP1
[P1]     select pg_backend_pid()
[P1] pq_execute: entering synchronous DBAPI compatibility mode
[P1] pq_fetch: pgstatus = PGRES_TUPLES_OK
[P1] pq_fetch: got tuples
[P1] _pq_fetch_tuples: looking for cast 23:
[P1] curs_get_cast:        per-connection dict: (nil)
[P1] curs_get_cast:        global dict: 0x..
[P1] _pq_fetch_tuples: using cast at 0x.. for type 23
[P1] psyco_curs_execute: res = 1, pgres = 0xP2
[P2] conn_close: PQfinish called
[P1] _psyco_curs_prefetch: result = 0
[P2] connection_dealloc: deleted connection object at 0x7f9e15f28180, refcnt = 0
[P1] _psyco_curs_buildrow: row 0, element 0, len 5
[P2] cursor_dealloc: deleted cursor object at 0x7f9e161069f8, refcnt = 0
[P1] _psyco_curs_buildrow: val->refcnt = 1
{'executesql': 'P3', 'function': 'thread:run'}
{'function': 'process:begin'}
[P1] psyco_curs_execute: starting execution of new query
[P1] pq_execute: pg connection at 0xP1 OK
[P1] pq_begin_locked: pgconn = 0xP1, autocommit = 0, status = 2
[P1] pq_begin_locked: transaction in progress
[P1] pq_execute: executing SYNC query: pgconn = 0xP1
[P1]     select pg_backend_pid()
[P1] psyco_curs_execute: res = -1, pgres = (nil)

So, on Py 3 it seems the objects inherited by the forked process undergo GC there, which for the connection sends a close command over a still-open FD.

Maybe this can be fixed by making sure that deleting the object doesn't call PQfinish if the current pid is not the one the connection was created in.
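
Translated to Python, the guard would look roughly like this (a sketch of the idea only; the real fix belongs in psycopg2's C layer):

    import os
    import psycopg2

    class PidGuardedConnection:
        """Close the libpq connection only from the process that created it."""

        def __init__(self, dsn):
            self._pid = os.getpid()  # remember the creating process
            self._conn = psycopg2.connect(dsn)

        def close(self):
            # After fork() the child inherits the socket FD; closing it
            # from the child would terminate the parent's session too.
            if os.getpid() == self._pid:
                self._conn.close()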

dvarrazzo added a commit that referenced this issue Mar 17, 2019
Unrelated processes close the FD of the connection. This happens in
Python 3.6 but not 2.7. Let's see if travis shows where else it fails...
@leongold

leongold commented Oct 8, 2019

Still hitting this issue on 2.8.3, under heavy multiprocessing use. Any particular info you'd like? Any hints? Will temporarily disabling GC help?

@dvarrazzo
Member

@leongold This bug is fixed. You are not hitting this bug, maybe another one. Please provide a test to reproduce your problem, thank you.

@yshahak

yshahak commented Oct 10, 2019

I'm getting this as well on 2.8.3. I'm doing multiprocessing of insertions: I have about 4000 to do, and at about 600 I get this:
psycopg2.OperationalError: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.

@dvarrazzo
Member

@yshahak This might even mean that your server closed the connection. Did you check in the server logs if that's what happened?

This bug is about an issue for which, on Python 3, processes closed each other's connections. If you produce an example demonstrating it's the client's fault, then we can do something. Failing that, I assume it's exactly what the message says: the server closed your connection, and you can't do much about it client side.

@leongold

@dvarrazzo I'm not entirely sure whether the test case fits exactly. I'll try to find the time to write my own test case; however, in the meanwhile:

  • I instantiate a connection object early in the subprocess' lifecycle. It doesn't do anything for a while.
  • When it's time to INSERT, I hit the aforementioned error (server closed the connection unexpectedly).
  • As a workaround, I re-instantiate a connection object before it's time to INSERT.

I'm not sure what causes this, but what's apparent to me is that a connection left idle for too long silently times out/disconnects.

In your reply to @yshahak you mentioned it might indeed be the server closing the client's connection; I don't believe that is the case. I believe the connection is already closed by that time.

@dvarrazzo
Member

@leongold remember that you cannot share connections across processes. If you open a connection, fork, and use it from the forked process, it won't work.
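
A safe pattern is to connect inside each worker, after the fork; a sketch (the empty DSN is a placeholder for your real connection parameters):

    import multiprocessing
    import psycopg2

    def worker(dsn):
        # Connect *inside* the child process, after the fork; never reuse
        # a connection object inherited from the parent.
        conn = psycopg2.connect(dsn)
        with conn, conn.cursor() as cur:
            cur.execute('select pg_backend_pid()')
            print(cur.fetchone()[0])
        conn.close()

    if __name__ == '__main__':
        dsn = ''  # placeholder
        procs = [multiprocessing.Process(target=worker, args=(dsn,))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()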

@leongold

leongold commented Oct 10, 2019

@dvarrazzo each subprocess instantiates its own connection... which also leads me to believe subprocessing isn't relevant here. Could it be there's simply an issue of idle connections silently disconnecting?

edit:

The server logs have a bunch of:

2019-10-09 00:10:49.762 UTC [2439] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:49.762 UTC [2439] LOG:  unexpected EOF on client connection with an open transaction
2019-10-09 00:10:49.799 UTC [2436] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:49.799 UTC [2436] LOG:  unexpected EOF on client connection with an open transaction
2019-10-09 00:10:50.067 UTC [2447] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:50.067 UTC [2447] LOG:  unexpected EOF on client connection with an open transaction
2019-10-09 00:10:50.476 UTC [2445] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:50.476 UTC [2445] LOG:  unexpected EOF on client connection with an open transaction
2019-10-09 00:10:50.553 UTC [2441] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:50.553 UTC [2441] LOG:  unexpected EOF on client connection with an open transaction
2019-10-09 00:10:50.738 UTC [2448] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:50.738 UTC [2448] LOG:  unexpected EOF on client connection with an open transaction

edit 2:

Another possibly important tidbit: my architecture was working flawlessly for months; this started occurring when the time between connection instantiation and the first INSERT grew by a couple of minutes.

And, as mentioned above, simply creating a new connection before INSERTing works around it.

@rahulhacker

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Can someone help me out on this?

@NayangiriGoswamiGit

"This bug is about an issue for which, on Python 3, processes closed each other's connections." Maybe I'm facing this issue while uploading bulk data with the multiprocessing library. I'm on psycopg2 version 2.8. Does anyone know the solution here?

@AndyMender

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Can someone help me out on this?

I encountered this while using my SQLAlchemy connector wrappers. The high-level cause was too much data in INSERT queries.

Since one typically doesn't have access to SQLAlchemy internals, what @dimandzhi recommended with SELECT 1 pings is a good idea. Alternatively, chunk your data into smaller blocks, as sketched below. It will be slower (more database calls) but more resistant to connection disruptions of any sort.
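
A chunked insert with plain psycopg2 could look like this (the items table and its columns are made up for illustration):

    from psycopg2.extras import execute_batch

    def insert_chunked(conn, rows, chunk_size=1000):
        """Insert rows in chunks, committing each chunk separately."""
        with conn.cursor() as cur:
            for start in range(0, len(rows), chunk_size):
                chunk = rows[start:start + chunk_size]
                # execute_batch collapses many INSERTs into few round trips
                execute_batch(
                    cur,
                    'INSERT INTO items (name, value) VALUES (%s, %s)',
                    chunk)
                conn.commit()  # keep each transaction small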

@dvarrazzo
Member

Or it could just be down to your server being configured with a statement_timeout that you are hitting.
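
You can check that setting from the client, and raise it per session if needed; a quick sketch (empty DSN as a placeholder):

    import psycopg2

    conn = psycopg2.connect('')  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute('SHOW statement_timeout')  # '0' means no limit
        print(cur.fetchone()[0])
        cur.execute("SET statement_timeout = '5min'")  # this session only
    conn.commit()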

@psycopg psycopg locked as resolved and limited conversation to collaborators Feb 14, 2020